R is a programming language designed to help you perform
statistical analysis, create graphics, and later on write your own
statistical software. R is becoming increasingly popular
and knowledge of R will help you on the job market. R is
probably the most versatile statistical tool out there (and it’s free
and open-source so you can literally use it anywhere). It is used, for example,
in all fields of academia, from biology to economics, and outside
academia as well.
RStudio is a great graphical user interface for R. In
recent years, a growing number of features have been added to this
graphical user interface, which makes it the preferred choice for
learning R, especially among beginners. You can think of
R as the engine of the car and RStudio as the
dashboard.
RStudio projects make it straightforward to divide your work into multiple contexts, each with its own working directory, workspace, history, and source documents. A project is basically a folder on your computer that holds all the files relevant to a particular piece of work. Working in RStudio Projects has multiple advantages:
When you open a project, a new R session
(process) is started. This makes sure that things you do in different
projects do not interfere with each other.
Git is a version control system that makes it easy to track changes
and work on code collaboratively. GitHub is a hosting service for
git. You can think of it as a public Dropbox for code but
on steroids. With version control, you will build your projects
step-by-step, be able to come back to any version of the project, and
accompany everything with human-readable messages.
As a student, you even get unlimited private repositories, which you can use if you don’t feel like sharing your code with the rest of the world (yet). We will use private repositories to distribute code and assignments to you, and you will use them to keep track of your code and collaborate in teams.
With git, writing code for a project will look somewhat like this:
A Git repository is a space where you store and manage a project. It contains all of your project’s files and stores each file’s revision history. It’s common to refer to a repository as a repo.
We will use one repository for each lab and one repository for each homework assignment. You can directly import (“pull”) repositories via RStudio and save them on your computer. If you changed something in your project, you can easily upload (“push”) the new version to GitHub. GitHub will keep track of all changes you made over time within your project.
Our workflow will appear a bit tricky at the beginning, but we are sure that you will handle it with ease very soon. We assume that by now you have downloaded and installed R and RStudio and have your personal GitHub account.
The course has its own page on GitHub. You can find it here: https://github.com/uni-mannheim-qm-2025. This is the place where you can find all relevant material for the lab sessions. It is also the place where you download and hand in your homework assignments.
So how does this work?
Go to https://github.com/uni-mannheim-qm-2025
and click on the repository for the current week (this week, this is
called week01_introduction). Now, click on the green
Clone or download button and select Use
HTTPS (this might already be selected by default, and if it is,
you’ll see the text Clone with HTTPS as in the image below). Click on
the clipboard icon to copy the repo URL.
In RStudio, click on File on the top bar and select New Project....
Select Version Control.
Select Git.
Paste the repo URL into the Repository URL window. Click on Browse to select the folder on your computer where you want to store the project.
Click on Create Project.
Open the .Rmd file that is stored in the project (in week 1, this is called QM2025_Week01.Rmd).
The RStudio interface has four panes:
You have probably all tried out ChatGPT, and yes, it is impressive! Are you allowed to use ChatGPT and its alternatives for your assignments in this course? And if yes, do we encourage it? Here is our view on these questions:
Yes, you are allowed to use ChatGPT. In fact, we neither see a way to fully prevent you from using it, nor do we think that trying to do so would be reasonable.
In our assessment, large language models such as ChatGPT can be extremely helpful to those who know what they are doing, but they are not helpful at all if you have no idea what you are doing. That means: you yourself need a good understanding of what you want to do (this mostly refers to the lecture material), and a good understanding of how R works (this refers mostly to what you learn in this course) to productively work with ChatGPT. It will only be helpful to you if (1) you write precise prompts and (2) you are able to critically evaluate ChatGPT’s responses (and spot the errors it makes). This is only possible if you learn quantitative methods and R by yourself first. Because of that, our advice is the following:
Enough preparation, let’s finally dive into R!
R can perform basic math operations. Here are some examples:
1 + 1
## [1] 2
Some more calculations:
2 - 3
## [1] -1
4 * 5
## [1] 20
2^2
## [1] 4
4 / 2
## [1] 2
2^(1 / 2)
## [1] 1.414214
R follows the usual order of operations. Use parentheses to control which part of an expression is evaluated first.
((2 + 2) * 2)^2
## [1] 64
This should give the same result as before.
(4 * 2)^2
## [1] 64
But this of course gives a different result:
(2 + 2 * 2)^2
## [1] 36
You can also use other math functions you know from your calculator:
this is \(\sqrt{2}\)
sqrt(2)
## [1] 1.414214
when you do not specify the base, R uses the natural log with base \(e\), i.e. \(\log_e(10)\)
log(10)
## [1] 2.302585
but R can also use a different (virtually any) base, e.g. \(\log_{10}(10)\)
log(10, base = 10)
## [1] 1
or with base = 5, i.e. \(\log_5(10)\)
log(10, 5)
## [1] 1.430677
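The logarithm with an arbitrary base can also be obtained from the natural log via the change-of-base rule, \(\log_b(x) = \log_e(x) / \log_e(b)\). The following is a small sketch to check this in R; the object names `x`, `b`, `direct` and `via_natural_log` are only chosen for illustration.

```r
# Change-of-base rule: log_b(x) = log(x) / log(b)
x <- 10
b <- 5

direct <- log(x, base = b)          # let R handle the base
via_natural_log <- log(x) / log(b)  # same value via natural logs

direct
via_natural_log

# all.equal() compares two numbers up to a small numerical tolerance
all.equal(direct, via_natural_log)
```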
Pro tip: Always close your parentheses!
It is hard to understand pure code, especially for someone who did not write it (and future-you will also have a hard time understanding it).
Pro tip: Add comments to your code, describing what you are doing and why you are doing it.
Comments start with the # symbol. Everything after a # on the same line will be commented
out.
# this is a comment
1 + 1 # This line runs
## [1] 2
# 1 + 1 This line does not run
Good coding style is like using correct punctuation.
Youcanmanagewithoutitbutitsuremakesthingseasiertoread.. – Hadley Wickham
But I already do have a calculator. Why do I need R?
R is so much more! R is an object-oriented programming language.
<- is the assignment operator.
Examples:
lucky_number <- 7
# Now we have created a (numeric) object called "lucky_number"
lucky_number
## [1] 7
The class() command lets us check the type of an
object:
lucky_number <- 7
class(lucky_number)
## [1] "numeric"
Let’s see how this works live, this time with a character object:
firstname <- "Domantas" # This is a character object
firstname
## [1] "Domantas"
class(firstname)
## [1] "character"
lastname <- "Undzėnas"
lastname
## [1] "Undzėnas"
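Once a value is stored in an object, you can reuse it in later calculations, which is what sets R apart from a calculator. Here is a minimal sketch of that idea; the names `doubled_number` and `full_name` are made up for this illustration.

```r
lucky_number <- 7
firstname <- "Domantas"
lastname <- "Undzėnas"

# Use a numeric object in a calculation and store the result
doubled_number <- lucky_number * 2
doubled_number

# paste() glues character objects together (separated by a space by default)
full_name <- paste(firstname, lastname)
full_name
class(full_name)
```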
Your turn: Here is your very first exercise!
Pro tip: Copy the lines of code that worked for something similar. Then, adjust the code according to your problem. That’s how coding works most of the time!
Create three objects:
1. `my_lucky_number` should contain your lucky number.
2. `my_firstname` should contain your first name.
3. `my_lastname` should contain your last name.
After you have created the objects, call them separately. Don’t forget to add comments to your code.
What kind of data can I store in R? R has different types of objects that can contain different types and sets of data: vectors, matrices, data frames, arrays, and lists.
We will go through all of these object types below. On top of that we will also learn how to calculate the measures of central tendency and variability with vectors.
Let’s start with vectors. We want a vector of the numbers 1, 2, 3, 4 and 5. How do I assign this set of numbers to a vector?
The c() function
combines single values into a vector:
example_vec <- c(1, 2, 3, 4, 5)
example_vec
## [1] 1 2 3 4 5
This also works for characters/strings:
country_code <- c("DE", "FR", "NL", "US", "UK")
country_code
## [1] "DE" "FR" "NL" "US" "UK"
And it also works for a combination of numbers and characters:
example_vec2 <- c("Welcome", "to", "the", "lab", "in", "A", 5, "or", "B", 6)
example_vec2
## [1] "Welcome" "to" "the" "lab" "in" "A" "5"
## [8] "or" "B" "6"
What if we start with numbers?
example_vec3 <- c(1, 2, 3, 4, 5, "R can count!")
example_vec3
## [1] "1" "2" "3" "4" "5"
## [6] "R can count!"
Note that if you have a character field in your vector, R will turn ALL values into character data! (You can see that by the quotes around the values)
Let’s check the type of data by using the class()
command on example_vec3.
example_vec3 <- c(1, 2, 3, 4, 5, "R can count!")
class(example_vec3)
## [1] "character"
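The same rule applies to other combinations: a vector can hold only one type, so R converts all values to the most general type present. The sketch below illustrates this with made-up object names; `as.numeric()` is one way to turn text that consists only of digits back into numbers.

```r
# Logical and numeric values mixed: TRUE/FALSE become 1/0
logical_and_numeric <- c(TRUE, FALSE, 3)
logical_and_numeric
class(logical_and_numeric)

# Numeric and character values mixed: everything becomes character
numeric_and_character <- c(1, 2, "three")
class(numeric_and_character)

# Numbers stored as text can be converted back with as.numeric()
digits_as_text <- c("1", "2", "3")
digits_as_numbers <- as.numeric(digits_as_text)
class(digits_as_numbers)
```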
You can use mathematical functions on each element in numeric vectors/matrices etc.
example_vec <- c(1, 2, 3, 4, 5)
sqrt(example_vec) # Take the square root of each element in example_vec
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
What about multiplication?
example_vec <- c(1, 2, 3, 4, 5)
example_vec * 10
## [1] 10 20 30 40 50
There are also some functions that you can use on the whole vector.
example_vec <- c(1, 2, 3, 4, 5)
sum(example_vec) # Question: What does sum() do?
## [1] 15
length(example_vec) # Question: What does length() do?
## [1] 5
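Measures of central tendency and variability work the same way: they take the whole vector and return a single summary. A short sketch using base R functions on `example_vec` could look like this (the choice of functions shown is ours, not an exhaustive list):

```r
example_vec <- c(1, 2, 3, 4, 5)

# Central tendency
mean(example_vec)    # arithmetic mean
median(example_vec)  # middle value

# Variability
var(example_vec)     # sample variance (divides by n - 1)
sd(example_vec)      # sample standard deviation, i.e. sqrt(var)
range(example_vec)   # smallest and largest value
```

Note that `var()` and `sd()` compute the sample versions, so they divide by the number of elements minus one.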
Matrices in R are two-dimensional table objects. In R, matrices are always row by column (think of “RC”, like roller coaster, Roman Catholic or Ray Charles).
In a matrix, all data must be of the same type. If you mix numeric and character entries, the matrix will be all character, just like in a vector.
How do I create a matrix in R?
example_mat1 <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 3,
ncol = 2
)
example_mat1 # How did R fill the numbers in the matrix?
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
You could also change the options and let R fill the matrix by rows (instead of columns):
example_mat2 <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 3,
ncol = 2,
byrow = TRUE
)
example_mat2 # See the difference?
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
## [3,] 5 6
Or you could create a matrix from different vectors by using
column-bind on two or more vectors. It works similarly to the
c() function but with vectors as input instead of
scalars.
Let’s first create two vectors of the same length:
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
# And now column-bind - cbind() - the two vectors.
example_mat3 <- cbind(vec1, vec2)
example_mat3
## vec1 vec2
## [1,] 1 7
## [2,] 2 8
## [3,] 3 9
## [4,] 4 10
## [5,] 5 11
## [6,] 6 12
Similarly, we can row-bind – rbind() – the two
vectors:
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
example_mat4 <- rbind(vec1, vec2)
example_mat4
## [,1] [,2] [,3] [,4] [,5] [,6]
## vec1 1 2 3 4 5 6
## vec2 7 8 9 10 11 12
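To connect this back to the rule that all entries of a matrix share one type, the following sketch builds a small matrix with one character entry and checks its dimensions and type. The object name `example_mat5` is hypothetical and only used here.

```r
# One character entry is enough to turn the whole matrix into character
example_mat5 <- matrix(c(1, 2, "three", 4),
  nrow = 2,
  ncol = 2
)
example_mat5

# dim() returns the number of rows and columns (row by column)
dim(example_mat5)

# Every entry is now stored as text, including the former numbers
class(example_mat5[1, 1])
```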
Data frames are two-dimensional table objects, just like matrices. Most data you will analyze in R will be in this form.
You can create data frames from vectors with data.frame(),
which works much like cbind():
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
example_df1 <- data.frame(vec1, vec2)
example_df1
## vec1 vec2
## 1 1 7
## 2 2 8
## 3 3 9
## 4 4 10
## 5 5 11
## 6 6 12
However, data frames are always column-bound vectors. In a data frame, everything within a column has to be of the same data type. But you can mix data types between columns:
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
vec3 <-
c(
"First Row",
"Second Row",
"Third Row",
"Fourth Row",
"Fifth Row",
"Sixth Row"
)
example_df2 <- data.frame(vec1, vec2, vec3)
example_df2
## vec1 vec2 vec3
## 1 1 7 First Row
## 2 2 8 Second Row
## 3 3 9 Third Row
## 4 4 10 Fourth Row
## 5 5 11 Fifth Row
## 6 6 12 Sixth Row
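If you want to verify which type each column has, you can apply `class()` to every column. This is a minimal sketch that assumes a recent R version (4.0.0 or later), where character vectors stay character when placed in a data frame.

```r
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
vec3 <- c(
  "First Row", "Second Row", "Third Row",
  "Fourth Row", "Fifth Row", "Sixth Row"
)
example_df2 <- data.frame(vec1, vec2, vec3)

# Apply class() to each column of the data frame
sapply(example_df2, class)

# str() gives a compact overview of the structure of the whole object
str(example_df2)
```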
You can also name your columns/variables, either when creating your data frame:
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
vec3 <-
c(
"First Row",
"Second Row",
"Third Row",
"Fourth Row",
"Fifth Row",
"Sixth Row"
)
example_df3 <- data.frame(
variable_1to6 = vec1,
variable_7to12 = vec2,
variable_rows = vec3
)
example_df3
## variable_1to6 variable_7to12 variable_rows
## 1 1 7 First Row
## 2 2 8 Second Row
## 3 3 9 Third Row
## 4 4 10 Fourth Row
## 5 5 11 Fifth Row
## 6 6 12 Sixth Row
Or by renaming the columns of an existing data frame:
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
vec3 <-
c(
"First Row",
"Second Row",
"Third Row",
"Fourth Row",
"Fifth Row",
"Sixth Row"
)
example_df3 <- data.frame(vec1, vec2, vec3)
# Rename the variables of an existing data frame
names(example_df3) <- c("variable.1", "variable.2", "variable.3")
example_df3
## variable.1 variable.2 variable.3
## 1 1 7 First Row
## 2 2 8 Second Row
## 3 3 9 Third Row
## 4 4 10 Fourth Row
## 5 5 11 Fifth Row
## 6 6 12 Sixth Row
We can also add a new variable to an existing data frame. We simply create a data frame which consists of a data frame and a vector:
example_df4 <-
data.frame(example_df3,
variable_4 = c(90, 91, 92, 93, 94, 95))
example_df4
## variable.1 variable.2 variable.3 variable_4
## 1 1 7 First Row 90
## 2 2 8 Second Row 91
## 3 3 9 Third Row 92
## 4 4 10 Fourth Row 93
## 5 5 11 Fifth Row 94
## 6 6 12 Sixth Row 95
Arrays are like matrices, except that they are typically three-dimensional. You’re not going to see many of these, but we’ll introduce them for completeness. Here is an illustration of what a three-dimensional array could look like:
For example, you can think of ten 3 x 5 bingo cards that all display spaces 1 through 15 as an array. To create that in R, I would use the array function and write:
bingo_array <- array(seq(1, 15, 1),
dim = c(3, 5, 10))
bingo_array
The general syntax for this function is
array(values, dim = c(rows, columns, height)).
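Selecting from an array works with one index per dimension. The sketch below, which rebuilds the bingo example, is only meant to illustrate the idea of `[row, column, height]` indexing.

```r
bingo_array <- array(seq(1, 15, 1),
  dim = c(3, 5, 10)
)

# Rows, columns and height (number of cards)
dim(bingo_array)

# The complete first card: all rows and columns of layer 1
bingo_array[, , 1]

# A single value: row 1, column 2 of the third card
bingo_array[1, 2, 3]
```

Because the 15 values are recycled to fill all ten cards, every card contains the same numbers in this example.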
List objects can contain a series of the other objects we just learned about. A single list can contain a value, a vector, a matrix, AND a data frame - or many of each!
How do I make a list?
Use the list()
function!
# create a vector
example_vec <- c(1, 2, 3, 4, 5, 6, 7, 8)
# create a matrix
example_mat <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 3,
ncol = 2)
# create an array
example_array <- array(seq(1, 15, 1), dim = c(3, 5, 10))
example_vec3 <- c(1, 2, 3, 4)
# Store all objects in a list
example_list <- list(example_vec, example_mat, example_array)
example_list
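To get a single object back out of a list, you can use double square brackets `[[ ]]` with the position of the element. The following sketch also shows a list with named elements; the names `numbers` and `table` are invented for this illustration.

```r
example_vec <- c(1, 2, 3, 4, 5, 6, 7, 8)
example_mat <- matrix(c(1, 2, 3, 4, 5, 6),
  nrow = 3,
  ncol = 2
)
example_list <- list(example_vec, example_mat)

# [[1]] returns the first element itself (here: the vector)
example_list[[1]]

# You can keep indexing into the extracted object:
# third row, second column of the matrix stored at position 2
example_list[[2]][3, 2]

# Elements can also be named and then accessed with $
named_list <- list(numbers = example_vec, table = example_mat)
named_list$numbers[2]
```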
Sometimes we want to select single or multiple data entries from our
objects. We can do this by selecting elements via [].
Let’s first do it with a vector. Remember our country_code vector?
country_code <- c("DE", "FR", "NL", "US", "UK")
country_code
## [1] "DE" "FR" "NL" "US" "UK"
Let’s say we only want to select the “US”. We can achieve this by accessing the value via its position in the vector:
country_code[4]
## [1] "US"
Now we want to select all values but the “US”:
country_code[-4]
## [1] "DE" "FR" "NL" "UK"
You can pass multiple indexes as a vector:
country_code[c(1, 2, 3)]
## [1] "DE" "FR" "NL"
1:3 generates the vector c(1, 2, 3) a bit
quicker.
country_code[1:3]
## [1] "DE" "FR" "NL"
If we want the values “DE”, “FR”, and “US”, a sequence does not really help. But we can put a vector with a combination of a sequence and some other values in the square brackets:
country_code[c(1:2, 4)]
## [1] "DE" "FR" "US"
We can access values of a matrix similarly. However, we need to think of one additional dimension.
example_mat <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 3,
ncol = 2)
example_mat
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
Generally, we type object[row, column] to access
specific rows and columns. To see how this works, let’s have a look at
our example_mat:
Now we want to access the value 6. It’s in the third row and the second column.
example_mat[3, 2]
## [1] 6
We could also select an entire column (and use it like a vector).
example_mat[, 2]
## [1] 4 5 6
By accessing values with the [] square brackets, we
can also replace values. Let’s say we want to recode the entire first
column in example_mat to 99:
example_mat[, 1] <- 99
example_mat
## [,1] [,2]
## [1,] 99 4
## [2,] 99 5
## [3,] 99 6
example_mat <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 3,
ncol = 2)
example_mat[, 1] <- 99
# And we want to recode the first and the third value in the second column
# to 91 and 100
example_mat[c(1, 3), 2] <- c(91, 100)
example_mat
## [,1] [,2]
## [1,] 99 91
## [2,] 99 5
## [3,] 99 100
This is a good start to select and recode data in an object. However, it might be a bit exhausting (maybe even impossible) to always look up the exact position in the object.
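Luckily, R also lets us select elements based on conditions. Instead of the position, we put a condition in the [] square brackets. The following operators are available:
== (equal to)
!= (not equal to)
< (less than)
> (greater than)
<= (less than or equal to)
>= (greater than or equal to)
& (and)
| (or)
So how do conditions work? Let’s create a matrix to work with: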
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
# And now column-bind (cbind()) the two vectors.
example_mat <- cbind(vec1, vec2)
example_mat
## vec1 vec2
## [1,] 1 7
## [2,] 2 8
## [3,] 3 9
## [4,] 4 10
## [5,] 5 11
## [6,] 6 12
example_mat > 9 # This returns TRUE or FALSE for each value in the object.
## vec1 vec2
## [1,] FALSE FALSE
## [2,] FALSE FALSE
## [3,] FALSE FALSE
## [4,] FALSE TRUE
## [5,] FALSE TRUE
## [6,] FALSE TRUE
Now if we put this condition in square brackets we get the values for which the condition is true.
example_mat[example_mat > 9]
## [1] 10 11 12
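Conditions can be combined with `&` (both must be true) and `|` (at least one must be true). Here is a small sketch on the same matrix; the object names for the results are only chosen for illustration.

```r
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
example_mat <- cbind(vec1, vec2)

# Values greater than 2 AND smaller than 9
between_2_and_9 <- example_mat[example_mat > 2 & example_mat < 9]
between_2_and_9

# Values smaller than 2 OR greater than 11
extremes <- example_mat[example_mat < 2 | example_mat > 11]
extremes

# A condition on one column can also be used to select entire rows
rows_with_large_vec2 <- example_mat[example_mat[, "vec2"] > 9, ]
rows_with_large_vec2
```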
Here comes the second round of exercises:
1. Create two vectors `vec1` and `vec2`. `vec1` should contain 1, 56, 23, 89, -3 and 5 (in that order). `vec2` should contain 24, 78, 32, 27, 8 and 1.
2. Now select elements of `vec1` that are greater than 5 or smaller than 0.
3. Next, set `vec1` to zero if `vec2` is greater than 30 and smaller than or equal to 32.
Please solve all three steps in the next code chunk.
Working with data frames is similar to working with matrices and vectors.
Usually (and especially for this class) we want to work with existing
datasets. R knows and can load most of the standard formats of datasets,
like .csv, .xlsx (Excel), .dta
(Stata), .sav (SPSS) and many more.
So far we have only used R’s base functions. In order to use some more sophisticated or special R functions, we need to load libraries or packages first. Think of these libraries as extra apps that you can install on your smartphone to extend its functionality.
Right now, we want to load the dataset. In order to read datasets stored in the formats of other statistical software (such as Stata or SPSS), we need the foreign package.
First, we want to have a look at what the package can do.
packageDescription("foreign")
## Package: foreign
## Priority: recommended
## Version: 0.8-86
## Date: 2023-11-26
## Title: Read Data Stored by 'Minitab', 'S', 'SAS', 'SPSS', 'Stata',
## 'Systat', 'Weka', 'dBase', ...
## Depends: R (>= 4.0.0)
## Imports: methods, utils, stats
## Authors@R: c( person("R Core Team", email = "R-core@R-project.org",
## role = c("aut", "cph", "cre")), person("Roger", "Bivand", role
## = c("ctb", "cph")), person(c("Vincent", "J."), "Carey", role =
## c("ctb", "cph")), person("Saikat", "DebRoy", role = c("ctb",
## "cph")), person("Stephen", "Eglen", role = c("ctb", "cph")),
## person("Rajarshi", "Guha", role = c("ctb", "cph")),
## person("Swetlana", "Herbrandt", role = "ctb"),
## person("Nicholas", "Lewin-Koh", role = c("ctb", "cph")),
## person("Mark", "Myatt", role = c("ctb", "cph")),
## person("Michael", "Nelson", role = "ctb"), person("Ben",
## "Pfaff", role = "ctb"), person("Brian", "Quistorff", role =
## "ctb"), person("Frank", "Warmerdam", role = c("ctb", "cph")),
## person("Stephen", "Weigand", role = c("ctb", "cph")),
## person("Free Software Foundation, Inc.", role = "cph"))
## Contact: see 'MailingList'
## Copyright: see file COPYRIGHTS
## Description: Reading and writing data stored by some versions of 'Epi
## Info', 'Minitab', 'S', 'SAS', 'SPSS', 'Stata', 'Systat',
## 'Weka', and for reading and writing some 'dBase' files.
## ByteCompile: yes
## Biarch: yes
## License: GPL (>= 2)
## BugReports: https://bugs.r-project.org
## MailingList: R-help@r-project.org
## URL: https://svn.r-project.org/R-packages/trunk/foreign/
## NeedsCompilation: yes
## Packaged: 2023-11-26 16:54:35 UTC; ripley
## Author: R Core Team [aut, cph, cre], Roger Bivand [ctb, cph], Vincent
## J. Carey [ctb, cph], Saikat DebRoy [ctb, cph], Stephen Eglen
## [ctb, cph], Rajarshi Guha [ctb, cph], Swetlana Herbrandt [ctb],
## Nicholas Lewin-Koh [ctb, cph], Mark Myatt [ctb, cph], Michael
## Nelson [ctb], Ben Pfaff [ctb], Brian Quistorff [ctb], Frank
## Warmerdam [ctb, cph], Stephen Weigand [ctb, cph], Free Software
## Foundation, Inc. [cph]
## Maintainer: R Core Team <R-core@R-project.org>
## Repository: CRAN
## Date/Publication: 2023-11-28 06:42:13 UTC
## Built: R 4.4.1; x86_64-w64-mingw32; 2024-06-14 08:34:00 UTC; windows
## Archs: x64
##
## -- File: C:/Program Files/R/R-4.4.1/library/foreign/Meta/package.rds
# Ok this seems to be useful. So let's load the package to use it.
library(foreign)
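A package has to be installed once on your computer before `library()` can load it, and it has to be loaded again in every new R session. The `foreign` package usually ships with R, but for other packages a pattern like the following sketch can be handy. It assumes an internet connection in case the package still needs to be installed.

```r
# Install the package only if it is not available yet
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")
}

# Load the package for the current R session
library(foreign)

# Check that the package is now attached
"package:foreign" %in% search()
```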
You will often come across datasets which are stored as Stata data
files. Those files have the extension .dta.
Right now, we want to load the data set called
weather_data_germany_2023.dta, which is already stored in the
raw_data folder in our directory:
weather_data <- read.dta("raw_data/weather_data_germany_2023.dta")
The data contains yearly temperature averages of German cities as well as their geographical location (longitude and latitude). It comes from the “Deutscher Wetterdienst” and you can find it here. Now that we have loaded the data, we can have a look at it.
With head() we can look at the first six rows of the data
set:
head(weather_data)
## city longitude latitude mean_temp
## 1 Sigmarszell-Zeisertsweiler 9.740446 47.57760 11.14
## 2 Obersulm-Willsbach 9.352493 49.12801 12.28
## 3 Röllbach 9.253038 49.76440 11.37
## 4 Padenstedt (Pony-Park) 9.925507 54.01884 10.22
## 5 Elzach-Fisnacht 8.108840 48.20121 11.32
## 6 Lippspringe, Bad 8.838795 51.78542 11.12
But we can also look at the entire data set:
weather_data
If we only want to look at the variable names, we can use
names():
names(weather_data)
## [1] "city" "longitude" "latitude" "mean_temp"
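A few more base R functions help to get an overview of a data frame. Since the sketch below should run on its own, it uses a small stand-in data frame called `weather_small` (a hypothetical name) built from three of the rows shown above; with the real data you would apply the same functions to `weather_data`.

```r
# Small stand-in for weather_data, using three rows from the head() output
weather_small <- data.frame(
  city = c("Sigmarszell-Zeisertsweiler", "Obersulm-Willsbach", "Elzach-Fisnacht"),
  longitude = c(9.740446, 9.352493, 8.108840),
  latitude = c(47.57760, 49.12801, 48.20121),
  mean_temp = c(11.14, 12.28, 11.32)
)

dim(weather_small)    # number of rows and columns
nrow(weather_small)   # number of rows (observations)
ncol(weather_small)   # number of columns (variables)
str(weather_small)    # type and first values of each column
summary(weather_small$mean_temp)  # five-number summary plus the mean
```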
Now we can use our selecting abilities on a data frame. As before we can select elements via their numeric position:
weather_data[1, 2] # first row, second column
## [1] 9.740446
weather_data[1:3, 1] # rows 1-3, first column
## [1] "Sigmarszell-Zeisertsweiler" "Obersulm-Willsbach"
## [3] "Röllbach"
Additionally, as columns usually have names in data frames, we can use the column names to select values in two ways.
First, we can put the column name in square brackets instead of a column number:
weather_data[1, "city"]
## [1] "Sigmarszell-Zeisertsweiler"
weather_data[, "mean_temp"]
We can also look at two variables at once:
weather_data[, c("city", "mean_temp")]
Second, we can also select an entire column by using the
$ operator with the column name:
data.frame_name$column_name. Just like this:
weather_data$mean_temp
## [1] 11.14 12.28 11.37 10.22 11.32 11.12 10.73 11.12 9.07 9.83 10.97 10.35
## [13] 10.49 10.06 10.48 10.09 7.69 10.89 10.72 10.39 11.70 10.56 12.43 11.26
## [25] 12.13 10.13 9.90 11.71 10.52 9.95 11.55 10.94 8.83 11.40 10.63 10.55
## [37] 10.51 11.19 9.90 10.70 9.67 12.31 11.44 10.69 10.69 9.83 11.29 10.35
## [49] 10.10 11.60 9.85 11.38 10.17 9.51 10.25 9.42 10.03 10.32 8.31 10.29
## [61] 9.50 11.41 9.73 10.79 10.69 9.40 10.08 7.88 10.26 11.35 12.79 11.12
## [73] 10.37 9.04 8.61 10.71 10.48 10.15 12.02 7.26 11.72 10.60 11.10 10.01
## [85] 10.39 10.34 10.52 8.52 11.59 7.12 8.82 10.50 10.16 10.11 9.75 10.22
## [97] 10.96 12.55 11.27 10.90 11.14 10.87 10.29 10.67 11.14 10.39 11.03 8.85
## [109] 10.78 7.67 10.62 10.37 11.67 10.78 10.70 10.04 8.79 13.14 9.99 10.36
## [121] 11.21 10.66 10.43 12.41 12.09 11.14 12.83 11.66 10.38 10.80 10.26 11.41
## [133] 10.25 10.90 10.90 9.73 11.23 10.58 9.66 10.78 9.89 10.98 10.16 10.43
## [145] 10.88 11.24 10.87 12.24 9.93 9.73 11.37 10.85 10.76 10.23 11.56 12.06
## [157] 8.29 11.23 10.57 12.17 11.04 4.76 10.73 11.79 10.56 10.69 10.53 10.61
## [169] 10.76 7.94 10.61 10.47 11.15 10.49 10.62 11.24 10.64 11.23 12.01 8.71
## [181] 12.45 12.31 10.79 10.14 10.83 10.38 10.74 10.31 9.28 11.03 9.46 10.60
## [193] 10.19 10.56 11.41 8.67 10.92 10.57 10.33 10.75 10.52 10.59 11.64 5.48
## [205] 11.52 10.07 10.56 10.15 11.62 10.98 11.85 10.42 10.05 10.59 10.28 11.32
## [217] 9.71 11.64 9.43 10.10 11.98 11.14 -2.90 10.99 10.09 10.58 11.81 11.15
## [229] 10.01 12.31 10.33 10.35 11.19 11.35 8.57 11.18 9.70 10.11 8.93 11.22
## [241] 12.32 10.30 10.34 10.65 11.31 11.96 11.04 10.22 10.64 10.24 10.03 9.40
## [253] 10.92 11.24 11.08 10.46 11.69 11.16 9.93 9.89 12.94 11.19 10.58 10.30
## [265] 11.08 10.34 9.96 10.49 10.36 10.66 11.05 10.19 11.06 10.47 10.25 10.57
## [277] 10.96 11.05 12.46 9.97 11.38 10.63 11.14 10.23 11.13 11.38 11.83 9.89
## [289] 12.41 11.08 10.90 9.50 10.07 11.67 11.82 8.96 10.38 11.50 10.54 10.72
## [301] 6.88 10.66 9.29 10.58 10.26 12.27 10.23 10.99 10.52 11.10 9.80 11.57
## [313] 10.44 10.82 11.13 10.87 11.18 10.16 10.03 9.46 9.28 10.89 12.83 10.05
## [325] 10.72 12.00 10.59 12.15 10.42 11.68 11.07 7.30 11.32 11.17 10.85 9.84
## [337] 10.39 10.99 10.93 8.73 11.31 11.44 8.45 11.41 10.30 10.38 9.19 9.88
## [349] 9.81 11.69 10.50 9.26 10.39 12.68 10.19 10.85 6.82 10.23 10.38 10.94
## [361] 10.28 9.28 10.81 12.30 10.19 11.35 12.03 10.09 10.97 10.97 11.40 12.67
## [373] 10.02 12.16 10.64 9.65 11.02 10.91 10.49 4.91 10.08 11.19 10.58 10.74
## [385] 11.18 10.89 11.76 11.69 10.61 10.26 10.46 11.52 9.45 9.96 10.70 9.97
## [397] 10.16 11.12 11.32 10.31 12.80 10.56 9.91 12.71 10.28 10.94 12.11 10.57
## [409] 11.82 11.42 10.78 10.36 8.21 10.72 10.40 10.26 10.19 10.45 11.62 11.20
## [421] 10.73 11.20 12.33 5.21 11.32 12.02 10.33 10.73 11.50 10.79 10.83 11.23
## [433] 10.45 8.92 11.07 11.55 10.82 11.19 9.13 10.20 10.84 9.16 8.66 11.11
## [445] 9.96 10.69 10.73 10.73 12.60 11.21 10.02 11.08 10.97 12.68
Columns from data frames are essentially vectors. We can use all the operations and functions we can use for vectors (depending on their class).
weather_data$mean_temp[1] # For example, we can select an element of the vector
## [1] 11.14
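The same holds for functions that summarize a whole vector. The following sketch uses a hypothetical data frame `toy_weather` that contains only the first six values of `mean_temp` shown above, so that it runs without the data file; with the full data you would write `weather_data$mean_temp` instead.

```r
# Stand-in data frame with the first six mean temperatures from the output above
toy_weather <- data.frame(
  mean_temp = c(11.14, 12.28, 11.37, 10.22, 11.32, 11.12)
)

mean(toy_weather$mean_temp)       # average temperature
max(toy_weather$mean_temp)        # highest temperature
which.max(toy_weather$mean_temp)  # position (row) of the highest temperature

# Conditions work as well: which values are above 11 degrees?
toy_weather$mean_temp[toy_weather$mean_temp > 11]
```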
What if we want to add a new variable? Let’s create a variable named “cold”.
weather_data$cold <- 0
# What does this do?
weather_data$cold
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 0 0
Now, we want to recode “cold” to 1 for cities whose mean temperature is lower than 8 degrees Celsius.
weather_data$cold <- 0
weather_data$cold[weather_data$mean_temp < 8] <- 1
# Let's have a look at both variables:
weather_data[, c("city", "mean_temp", "cold")]
## city mean_temp cold
## 1 Sigmarszell-Zeisertsweiler 11.14 0
## 2 Obersulm-Willsbach 12.28 0
## 3 Röllbach 11.37 0
## 4 Padenstedt (Pony-Park) 10.22 0
## 5 Elzach-Fisnacht 11.32 0
## 6 Lippspringe, Bad 11.12 0
## 7 Ummendorf 10.73 0
## 8 Tholey 11.12 0
## 9 Garmisch-Partenkirchen 9.07 0
## 10 Veilsdorf 9.83 0
## 11 Wernigerode 10.97 0
## 12 Pelzerhaken 10.35 0
## 13 Balingen-Bronnhaupten 10.49 0
## 14 Kronach 10.06 0
## 15 Heckelberg 10.48 0
## 16 Kaisersbach-Cronhütte 10.09 0
## 17 Kleiner Inselsberg 7.69 1
## 18 Starkenberg-Tegkwitz 10.89 0
## 19 Schwandorf 10.72 0
## 20 Quickborn 10.39 0
## 21 Darmstadt 11.70 0
## 22 Staffelstein, Bad-Stublang 10.56 0
## 23 Geisenheim 12.43 0
## 24 Rahden-Kleinendorf 11.26 0
## 25 Heinsberg-Schleiden 12.13 0
## 26 Eichstätt-Landershofen 10.13 0
## 27 Parsberg/Oberpfalz-Eglwang 9.90 0
## 28 Perl-Nennig 11.71 0
## 29 Warburg 10.52 0
## 30 Altheim, Kreis Biberach 9.95 0
## 31 Friedrichshafen-Unterraderach 11.55 0
## 32 Wangerland-Hooksiel 10.94 0
## 33 Lenzkirch-Ruhbühl 8.83 0
## 34 Neunkirchen-Wellesweiler 11.40 0
## 35 Boizenburg 10.63 0
## 36 Leuchtturm Kiel 10.55 0
## 37 Rosengarten-Klecken 10.51 0
## 38 Artern 11.19 0
## 39 Barth 9.90 0
## 40 Schlüchtern-Herolz 10.70 0
## 41 Neustadt am Kulm-Filchendorf 9.67 0
## 42 Düsseldorf 12.31 0
## 43 Freudenberg/Main-Boxtal 11.44 0
## 44 Weißenburg-Emetzheim 10.69 0
## 45 Querfurt-Mühle Lodersleben 10.69 0
## 46 Oberhaching-Laufzorn 9.83 0
## 47 Wusterwitz 11.29 0
## 48 Königshofen, Bad 10.35 0
## 49 Ostenfeld (Rendsburg) 10.10 0
## 50 Wuppertal-Buchenhofen 11.60 0
## 51 Karlshagen 9.85 0
## 52 Wolfach 11.38 0
## 53 Martinroda 10.17 0
## 54 Oberviechtach 9.51 0
## 55 Hasenkrug-Hardebek 10.25 0
## 56 Waldmünchen 9.42 0
## 57 Schorndorf-Knöbling 10.03 0
## 58 Blankenrath 10.32 0
## 59 Birx/Rhön 8.31 0
## 60 Aue 10.29 0
## 61 Kaufbeuren-Oberbeuren 9.50 0
## 62 Pirmasens 11.41 0
## 63 Stötten 9.73 0
## 64 Görlitz 10.79 0
## 65 Waldems-Reinborn 10.69 0
## 66 Pfullendorf 9.40 0
## 67 Neubulach-Oberhaugstett 10.08 0
## 68 Kleiner Feldberg/Taunus 7.88 1
## 69 Trollenhagen 10.26 0
## 70 Bernburg/Saale (Nord) 11.35 0
## 71 Lahr 12.79 0
## 72 Cölbe, Kr. Marburg-Biedenkopf 11.12 0
## 73 Steinau, Kr. Cuxhaven 10.37 0
## 74 Lobenstein, Bad 9.04 0
## 75 Oberstdorf 8.61 0
## 76 Göttingen 10.71 0
## 77 Mühldorf 10.48 0
## 78 Erfde 10.15 0
## 79 Königswinter-Heiderhof 12.02 0
## 80 Wasserkuppe 7.26 1
## 81 Borken in Westfalen 11.72 0
## 82 Müncheberg 10.60 0
## 83 Bremen 11.10 0
## 84 Kiefersfelden-Gach 10.01 0
## 85 Grambek 10.39 0
## 86 Lichtenhain-Mittelndorf 10.34 0
## 87 Erfurt-Weimar 10.52 0
## 88 Oberharz am Brocken-Stiege 8.52 0
## 89 Trier-Petrisberg 11.59 0
## 90 Kahler Asten 7.12 1
## 91 Schneifelforsthaus 8.82 0
## 92 Chieming 10.50 0
## 93 Moringen-Lutterbeck 10.16 0
## 94 Stechlin-Menz 10.11 0
## 95 Kempten 9.75 0
## 96 Wittstock-Rote Mühle 10.22 0
## 97 Großenkneten 10.96 0
## 98 Müllheim 12.55 0
## 99 Möhrendorf-Kleinseebach 11.27 0
## 100 Landshut-Reithof 10.90 0
## 101 Belm 11.14 0
## 102 Klipphausen-Garsebach 10.87 0
## 103 Grünow 10.29 0
## 104 Michelstadt-Vielbrunn 10.67 0
## 105 Potsdam 11.14 0
## 106 Weihenstephan-Dürnast 10.39 0
## 107 Doberlug-Kirchhain 11.03 0
## 108 Zwiesel 8.85 0
## 109 Wittingen-Vorhop 10.78 0
## 110 Deutschneudorf-Brüderwiese 7.67 1
## 111 Sankt Peter-Ording 10.62 0
## 112 Marnitz 10.37 0
## 113 Michelstadt 11.67 0
## 114 Kissingen, Bad 10.78 0
## 115 Ruppertsecken 10.70 0
## 116 Plauen 10.04 0
## 117 Elster, Bad-Sohl 8.79 0
## 118 Waghäusel-Kirrlach 13.14 0
## 119 Feuchtwangen-Heilbronn 9.99 0
## 120 Lennestadt-Theten 10.36 0
## 121 Berlin Brandenburg 11.21 0
## 122 Muskau, Bad 10.66 0
## 123 Waltershausen 10.43 0
## 124 Kahl/Main 12.41 0
## 125 Geldern-Walbeck 12.09 0
## 126 Berlin-Dahlem (FU) 11.14 0
## 127 Mannheim 12.83 0
## 128 Würzburg 11.66 0
## 129 Ueckermünde 10.38 0
## 130 Naumburg/Saale-Kreipitzsch 10.80 0
## 131 Hermaringen-Allewind 10.26 0
## 132 Aachen-Orsbach 11.41 0
## 133 Hohwacht 10.25 0
## 134 Baruth 10.90 0
## 135 Helmstedt-Emmerstedt 10.90 0
## 136 Ulm-Mähringen 9.73 0
## 137 Hannover 11.23 0
## 138 Altomünster-Maisbrunn 10.58 0
## 139 Eslohe 9.66 0
## 140 Fritzlar/Eder 10.78 0
## 141 Feldberg/Mecklenburg 9.89 0
## 142 Leuchtturm Alte Weser 10.98 0
## 143 Greifswald 10.16 0
## 144 Idar-Oberstein 10.43 0
## 145 Krölpa-Rockendorf 10.88 0
## 146 Schwäbisch Gmünd-Weiler 11.24 0
## 147 Lenzen/Elbe 10.87 0
## 148 Andernach 12.24 0
## 149 Tribsees 9.93 0
## 150 Schleiz 9.73 0
## 151 Mühlacker 11.37 0
## 152 Hümmerich 10.85 0
## 153 Dillingen/Donau-Fristingen 10.76 0
## 154 Dörnick 10.23 0
## 155 Pforzheim-Ispringen 11.56 0
## 156 Bochum 12.06 0
## 157 Braunlage 8.29 0
## 158 Dörpen 11.23 0
## 159 Amberg-Unterammersricht 10.57 0
## 160 Sachsenheim 12.17 0
## 161 Seehausen 11.04 0
## 162 Großer Arber 4.76 1
## 163 Lohr/Main-Halsbach 10.73 0
## 164 Eppingen-Elsenz 11.79 0
## 165 Oberzent-Beerfelden 10.56 0
## 166 Reichshof-Eckenhagen 10.69 0
## 167 Neuburg/Kammel-Langenhaslach 10.53 0
## 168 Schwerin 10.61 0
## 169 Weimar-Schöndorf 10.76 0
## 170 Wernigerode-Schierke 7.94 1
## 171 Geringswalde-Altgeringswalde 10.61 0
## 172 Schmieritz-Weltwitz 10.47 0
## 173 Helgoland 11.15 0
## 174 Gottfrieding 10.49 0
## 175 Kirchdorf/Poel 10.62 0
## 176 Berus 11.24 0
## 177 Trostberg 10.64 0
## 178 Dachwig 11.23 0
## 179 Metzingen 12.01 0
## 180 Marienberg 8.71 0
## 181 Duisburg-Baerl 12.45 0
## 182 Stuttgart (Schnarrenberg) 12.31 0
## 183 Hechingen 10.79 0
## 184 Hahn 10.14 0
## 185 Reimlingen 10.83 0
## 186 Mallersdorf-Pfaffenberg-Oberlindhart 10.38 0
## 187 Hersfeld, Bad 10.74 0
## 188 Itzehoe 10.31 0
## 189 Merklingen 9.28 0
## 190 Lübben-Blumenfelde 11.03 0
## 191 Prackenbach-Neuhäusl 9.46 0
## 192 Kirchberg/Jagst-Herboldshausen 10.60 0
## 193 Ebrach 10.19 0
## 194 München-Flughafen 10.56 0
## 195 Jeßnitz 11.41 0
## 196 Reit im Winkl 8.67 0
## 197 Wiesbaden-Auringen 10.92 0
## 198 Wendisch Evern 10.57 0
## 199 Wacken 10.33 0
## 200 Hamburg-Fuhlsbüttel 10.75 0
## 201 Donauwörth-Osterweiler 10.52 0
## 202 Bremervörde 10.59 0
## 203 Buchenbach 11.64 0
## 204 Feldberg/Schwarzwald 5.48 1
## 205 Köthen (Anhalt) 11.52 0
## 206 Heinersreuth-Vollhof 10.07 0
## 207 Zehdenick 10.56 0
## 208 Gräfenberg-Kasberg 10.15 0
## 209 Wunstorf 11.62 0
## 210 Berge 10.98 0
## 211 Kitzingen 11.85 0
## 212 Teterow 10.42 0
## 213 Manderscheid-Sonnenhof 10.05 0
## 214 Renningen-Ihinger Hof 10.59 0
## 215 Twistetal-Mühlhausen 10.28 0
## 216 Bevern, Kr. Holzminden 11.32 0
## 217 Meiningen 9.71 0
## 218 Kleve 11.64 0
## 219 Sohland/Spree 9.43 0
## 220 Langenwetzendorf-Göttendorf 10.10 0
## 221 Weilerswist-Lommersum 11.98 0
## 222 Osterfeld 11.14 0
## 223 Zugspitze -2.90 1
## 224 Friesoythe-Altenoythe 10.99 0
## 225 Leinefelde 10.09 0
## 226 Freiburg/Elbe 10.58 0
## 227 Waltrop-Abdinghof 11.81 0
## 228 Salzuflen, Bad 11.15 0
## 229 Berka, Bad (Flugplatz) 10.01 0
## 230 Tönisvorst 12.31 0
## 231 Chemnitz 10.33 0
## 232 Goldberg 10.35 0
## 233 Nürnberg 11.19 0
## 234 Barsinghausen-Hohenbostel 11.35 0
## 235 Berleburg, Bad-Stünzel 8.57 0
## 236 Norderney 11.18 0
## 237 Kall-Sistig 9.70 0
## 238 Lüdenscheid 10.11 0
## 239 Klippeneck 8.93 0
## 240 Braunschweig 11.22 0
## 241 Saarbrücken-Burbach 12.32 0
## 242 Alsfeld-Eifa 10.30 0
## 243 Schwarzburg 10.34 0
## 244 Wiesenburg 10.65 0
## 245 Leipzig/Halle 11.31 0
## 246 Nauheim, Bad 11.96 0
## 247 Harzburg, Bad 11.04 0
## 248 Grambow-Schwennenz 10.22 0
## 249 Uelzen 10.64 0
## 250 Amerang-Pfaffing 10.24 0
## 251 Holzkirchen 10.03 0
## 252 Villingen-Schwenningen 9.40 0
## 253 Bamberg 10.92 0
## 254 Wittenberg 11.24 0
## 255 Möckern-Drewitz 11.08 0
## 256 Fehmarn 10.46 0
## 257 Lippstadt-Bökenförde 11.69 0
## 258 Singen 11.16 0
## 259 Arkona 9.93 0
## 260 Meinerzhagen-Redlendorf 9.89 0
## 261 Freiburg 12.94 0
## 262 Gevelsberg-Oberbröking 11.19 0
## 263 Lichtentanne 10.58 0
## 264 Schauenburg-Elgershausen 10.30 0
## 265 Schonungen-Mainberg 11.08 0
## 266 Lautertal-Oberlauter 10.34 0
## 267 Pommelsbrunn-Mittelburg 9.96 0
## 268 Deuselbach 10.49 0
## 269 Herzberg 10.36 0
## 270 Aldersbach-Kramersepp 10.66 0
## 271 Lindenberg 11.05 0
## 272 Gelbelsee 10.19 0
## 273 Alfhausen 11.06 0
## 274 Augsburg 10.47 0
## 275 Metten 10.25 0
## 276 Kyritz 10.57 0
## 277 Gardelegen 10.96 0
## 278 Eschwege 11.05 0
## 279 Frankfurt/Main 12.46 0
## 280 Memmingen 9.97 0
## 281 Klitzschen bei Torgau 11.38 0
## 282 Straubing 10.63 0
## 283 Wolfsburg (Südwest) 11.14 0
## 284 Burgwald-Bottendorf 10.23 0
## 285 Kubschütz, Kr. Bautzen 11.13 0
## 286 Notzingen 11.38 0
## 287 Essen-Bredeney 11.83 0
## 288 Saldenburg-Entschenreuth 9.89 0
## 289 Worms 12.41 0
## 290 Bassum 11.08 0
## 291 Manschnow 10.90 0
## 292 Neukirchen-Hauptschwenda 9.50 0
## 293 Schleswig 10.07 0
## 294 Jena (Sternwarte) 11.67 0
## 295 Waibstadt 11.82 0
## 296 Oy-Mittelberg-Petersthal 8.96 0
## 297 Lübeck-Blankensee 10.38 0
## 298 Neunkirchen-Seelscheid-Krawinkel 11.50 0
## 299 Neuruppin-Alt Ruppin 10.54 0
## 300 Seesen 10.72 0
## 301 Zinnwald-Georgenfeld 6.88 1
## 302 Mittelnkirchen-Hohenfelde 10.66 0
## 303 Tirschenreuth-Lodermühl 9.29 0
## 304 Soltau 10.58 0
## 305 Piding 10.26 0
## 306 Emmendingen-Mundingen 12.27 0
## 307 Hattstedt 10.23 0
## 308 Berlin-Buch 10.99 0
## 309 Ellwangen-Rindelbach 10.52 0
## 310 Genthin 11.10 0
## 311 Putbus 9.80 0
## 312 München-Stadt 11.57 0
## 313 Salzungen, Bad-Gräfen-Nitzendorf 10.44 0
## 314 Langenlipsdorf 10.82 0
## 315 Aschersleben-Mehringen 11.13 0
## 316 Rothenburg ob der Tauber 10.87 0
## 317 Holzdorf-Bernsdorf 11.18 0
## 318 Schönhagen (Ostseebad) 10.16 0
## 319 Sandberg 10.03 0
## 320 Leutkirch-Herlazhofen 9.46 0
## 321 Grainet-Rehberg 9.28 0
## 322 Simbach/Inn 10.89 0
## 323 Ohlsbach 12.83 0
## 324 Bertsdorf-Hörnitz 10.05 0
## 325 Worpswede-Hüttenbusch 10.72 0
## 326 Schaafheim-Schlierbach 12.00 0
## 327 Gera-Leumnitz 10.59 0
## 328 Offenbach-Wetterpark 12.15 0
## 329 Günzburg 10.42 0
## 330 Dresden-Hosterwitz 11.68 0
## 331 Cuxhaven 11.07 0
## 332 Neuhaus am Rennweg 7.30 1
## 333 Hameln-Hastenbeck 11.32 0
## 334 Borkum-Flugplatz 11.17 0
## 335 Rostock-Warnemünde 10.85 0
## 336 Steinhagen-Negast 9.84 0
## 337 Weinbiet 10.39 0
## 338 Emden 10.99 0
## 339 Nideggen-Schmidt 10.93 0
## 340 Schönwald/Ofr.-Brunn 8.73 0
## 341 Runkel-Ennerich 11.31 0
## 342 Leipzig-Holzhausen 11.44 0
## 343 Fichtelberg/Oberfranken-Hüttstadl 8.45 0
## 344 Quedlinburg 11.41 0
## 345 Sontra 10.30 0
## 346 Fürstenzell 10.38 0
## 347 Münsingen-Apfelstetten 9.19 0
## 348 Treuen 9.88 0
## 349 Siegsdorf-Höll 9.81 0
## 350 Kaiserslautern 11.69 0
## 351 Kiel-Holtenau 10.50 0
## 352 Harzgerode 9.26 0
## 353 Elpersbüttel 10.39 0
## 354 Frankfurt/Main-Westend 12.68 0
## 355 Lügde-Paenbruch 10.19 0
## 356 Nossen 10.85 0
## 357 Carlsfeld 6.82 1
## 358 Anklam 10.23 0
## 359 Ebersberg-Halbing 10.38 0
## 360 Lüchow 10.94 0
## 361 Markt Erlbach-Hagenhofen 10.28 0
## 362 Hof 9.28 0
## 363 Olsdorf 10.81 0
## 364 Trier-Zewen 12.30 0
## 365 Tann/Rhön 10.19 0
## 366 Saarbrücken-Ensheim 11.35 0
## 367 Baden-Baden-Geroldsau 12.03 0
## 368 Weiden 10.09 0
## 369 Arnstein-Müdesheim 10.97 0
## 370 Bielefeld-Deppendorf 10.97 0
## 371 Lingen-Baccum 11.40 0
## 372 Dürkheim, Bad 12.67 0
## 373 Groß Lüsewitz 10.02 0
## 374 Neuenahr, Bad-Ahrweiler 12.16 0
## 375 Mühlhausen/Thüringen-Görmar 10.64 0
## 376 Sigmaringen-Laiz 9.65 0
## 377 Olbersleben 11.02 0
## 378 Hilgenroth 10.91 0
## 379 Angermünde 10.49 0
## 380 Brocken 4.91 1
## 381 Rottweil 10.08 0
## 382 Diepholz 11.19 0
## 383 Harburg 10.58 0
## 384 Elsendorf-Horneck 10.74 0
## 385 Hamburg-Neuwiedenthal 11.18 0
## 386 Ingolstadt-Manching 10.89 0
## 387 Lüdinghausen-Brochtrup 11.76 0
## 388 Magdeburg 11.69 0
## 389 Buchen, Kr. Neckar-Odenwald 10.61 0
## 390 Wittenborn 10.26 0
## 391 Maisach-Galgen 10.46 0
## 392 Konstanz 11.52 0
## 393 Dachsberg-Wolpadingen 9.45 0
## 394 Leck 9.96 0
## 395 Arnsberg-Neheim 10.70 0
## 396 Schleswig-Jagel 9.97 0
## 397 Attenkam 10.16 0
## 398 Hoyerswerda 11.12 0
## 399 Cottbus 11.32 0
## 400 Boltenhagen 10.31 0
## 401 Köln-Stammheim 12.80 0
## 402 Löhnberg-Obershausen 10.56 0
## 403 Dippoldiswalde-Reinberg 9.91 0
## 404 Bergzabern, Bad 12.71 0
## 405 Simmern-Wahlbach 10.28 0
## 406 Wutöschingen-Ofteringen 10.94 0
## 407 Öhringen 12.11 0
## 408 Fulda-Horas 10.57 0
## 409 Werl 11.82 0
## 410 Ennigerloh-Ostenfelde 11.42 0
## 411 Alfeld 10.78 0
## 412 Greifswalder Oie 10.36 0
## 413 Meßstetten-Appental 8.21 0
## 414 Roth 10.72 0
## 415 Hiddensee-Vitte 10.40 0
## 416 Neu-Ulrichstein 10.26 0
## 417 Weidenbach-Weiherschneidbach 10.19 0
## 418 Waren (Müritz) 10.45 0
## 419 Münster/Osnabrück 11.62 0
## 420 Nienburg 11.20 0
## 421 Falkenberg,Kr.Rottal-Inn 10.73 0
## 422 Groß Berßen 11.20 0
## 423 Rheinfelden 12.33 0
## 424 Fichtelberg 5.21 1
## 425 Lauchstädt, Bad 11.32 0
## 426 Köln/Bonn 12.02 0
## 427 Wielenbach (Demollstr.) 10.33 0
## 428 Neuburg an der Donau 10.73 0
## 429 Ahaus 11.50 0
## 430 Rotenburg (Wümme) 10.79 0
## 431 Rosenheim 10.83 0
## 432 Oschatz 11.23 0
## 433 Eisenach 10.45 0
## 434 Wunsiedel-Schönbrunn 8.92 0
## 435 Ingelfingen-Stachenhausen 11.07 0
## 436 Berlin-Tempelhof 11.55 0
## 437 Regensburg 10.82 0
## 438 Weiskirchen/Saar 11.19 0
## 439 Hohenpeißenberg 9.13 0
## 440 Laage-Kronskamp 10.20 0
## 441 Schipkau-Klettwitz 10.84 0
## 442 Freudenstadt 9.16 0
## 443 Teuschnitz 8.66 0
## 444 Demker 11.11 0
## 445 Nürburg-Barweiler 9.96 0
## 446 Coschen 10.69 0
## 447 Großerlach-Mannenweiler 10.73 0
## 448 Kösching 10.73 0
## 449 Rheinau-Memprechtshofen 12.60 0
## 450 Dresden-Klotzsche 11.21 0
## 451 Geisingen 10.02 0
## 452 Zeitz 11.08 0
## 453 Weingarten, Kr. Ravensburg 10.97 0
## 454 Rheinstetten 12.68 0
Let’s look at the Measures of Central Tendency and Variability from the lecture (starting at slide 17).
Consider the following vector:
example_vec <- c(1, 2, 3, 4, 5)
How could we calculate the mean of example_vec?
We could simply calculate it “by hand”:
(1 + 2 + 3 + 4 + 5) / 5
## [1] 3
But this is not very useful if we look at an actual vector in our data frame, e.g., mean temperature:
weather_data$mean_temp
## [1] 11.14 12.28 11.37 10.22 11.32 11.12 10.73 11.12 9.07 9.83 10.97 10.35
## [13] 10.49 10.06 10.48 10.09 7.69 10.89 10.72 10.39 11.70 10.56 12.43 11.26
## [25] 12.13 10.13 9.90 11.71 10.52 9.95 11.55 10.94 8.83 11.40 10.63 10.55
## [37] 10.51 11.19 9.90 10.70 9.67 12.31 11.44 10.69 10.69 9.83 11.29 10.35
## [49] 10.10 11.60 9.85 11.38 10.17 9.51 10.25 9.42 10.03 10.32 8.31 10.29
## [61] 9.50 11.41 9.73 10.79 10.69 9.40 10.08 7.88 10.26 11.35 12.79 11.12
## [73] 10.37 9.04 8.61 10.71 10.48 10.15 12.02 7.26 11.72 10.60 11.10 10.01
## [85] 10.39 10.34 10.52 8.52 11.59 7.12 8.82 10.50 10.16 10.11 9.75 10.22
## [97] 10.96 12.55 11.27 10.90 11.14 10.87 10.29 10.67 11.14 10.39 11.03 8.85
## [109] 10.78 7.67 10.62 10.37 11.67 10.78 10.70 10.04 8.79 13.14 9.99 10.36
## [121] 11.21 10.66 10.43 12.41 12.09 11.14 12.83 11.66 10.38 10.80 10.26 11.41
## [133] 10.25 10.90 10.90 9.73 11.23 10.58 9.66 10.78 9.89 10.98 10.16 10.43
## [145] 10.88 11.24 10.87 12.24 9.93 9.73 11.37 10.85 10.76 10.23 11.56 12.06
## [157] 8.29 11.23 10.57 12.17 11.04 4.76 10.73 11.79 10.56 10.69 10.53 10.61
## [169] 10.76 7.94 10.61 10.47 11.15 10.49 10.62 11.24 10.64 11.23 12.01 8.71
## [181] 12.45 12.31 10.79 10.14 10.83 10.38 10.74 10.31 9.28 11.03 9.46 10.60
## [193] 10.19 10.56 11.41 8.67 10.92 10.57 10.33 10.75 10.52 10.59 11.64 5.48
## [205] 11.52 10.07 10.56 10.15 11.62 10.98 11.85 10.42 10.05 10.59 10.28 11.32
## [217] 9.71 11.64 9.43 10.10 11.98 11.14 -2.90 10.99 10.09 10.58 11.81 11.15
## [229] 10.01 12.31 10.33 10.35 11.19 11.35 8.57 11.18 9.70 10.11 8.93 11.22
## [241] 12.32 10.30 10.34 10.65 11.31 11.96 11.04 10.22 10.64 10.24 10.03 9.40
## [253] 10.92 11.24 11.08 10.46 11.69 11.16 9.93 9.89 12.94 11.19 10.58 10.30
## [265] 11.08 10.34 9.96 10.49 10.36 10.66 11.05 10.19 11.06 10.47 10.25 10.57
## [277] 10.96 11.05 12.46 9.97 11.38 10.63 11.14 10.23 11.13 11.38 11.83 9.89
## [289] 12.41 11.08 10.90 9.50 10.07 11.67 11.82 8.96 10.38 11.50 10.54 10.72
## [301] 6.88 10.66 9.29 10.58 10.26 12.27 10.23 10.99 10.52 11.10 9.80 11.57
## [313] 10.44 10.82 11.13 10.87 11.18 10.16 10.03 9.46 9.28 10.89 12.83 10.05
## [325] 10.72 12.00 10.59 12.15 10.42 11.68 11.07 7.30 11.32 11.17 10.85 9.84
## [337] 10.39 10.99 10.93 8.73 11.31 11.44 8.45 11.41 10.30 10.38 9.19 9.88
## [349] 9.81 11.69 10.50 9.26 10.39 12.68 10.19 10.85 6.82 10.23 10.38 10.94
## [361] 10.28 9.28 10.81 12.30 10.19 11.35 12.03 10.09 10.97 10.97 11.40 12.67
## [373] 10.02 12.16 10.64 9.65 11.02 10.91 10.49 4.91 10.08 11.19 10.58 10.74
## [385] 11.18 10.89 11.76 11.69 10.61 10.26 10.46 11.52 9.45 9.96 10.70 9.97
## [397] 10.16 11.12 11.32 10.31 12.80 10.56 9.91 12.71 10.28 10.94 12.11 10.57
## [409] 11.82 11.42 10.78 10.36 8.21 10.72 10.40 10.26 10.19 10.45 11.62 11.20
## [421] 10.73 11.20 12.33 5.21 11.32 12.02 10.33 10.73 11.50 10.79 10.83 11.23
## [433] 10.45 8.92 11.07 11.55 10.82 11.19 9.13 10.20 10.84 9.16 8.66 11.11
## [445] 9.96 10.69 10.73 10.73 12.60 11.21 10.02 11.08 10.97 12.68
Typing up all the entries individually would take a lot of time. Instead, we could use two functions that we have already seen, sum() and length().
sum(weather_data$mean_temp) / length(weather_data$mean_temp)
## [1] 10.56586
Fortunately, R provides a much easier way to calculate a mean:
mean(weather_data$mean_temp) # That was easy.
## [1] 10.56586
But be sure that your vector is numeric. Could you calculate the mean of city?
weather_data$city
## [1] "Sigmarszell-Zeisertsweiler"
## [2] "Obersulm-Willsbach"
## [3] "Röllbach"
## [4] "Padenstedt (Pony-Park)"
## [5] "Elzach-Fisnacht"
## [6] "Lippspringe, Bad"
## [7] "Ummendorf"
## [8] "Tholey"
## [9] "Garmisch-Partenkirchen"
## [10] "Veilsdorf"
## [11] "Wernigerode"
## [12] "Pelzerhaken"
## [13] "Balingen-Bronnhaupten"
## [14] "Kronach"
## [15] "Heckelberg"
## [16] "Kaisersbach-Cronhütte"
## [17] "Kleiner Inselsberg"
## [18] "Starkenberg-Tegkwitz"
## [19] "Schwandorf"
## [20] "Quickborn"
## [21] "Darmstadt"
## [22] "Staffelstein, Bad-Stublang"
## [23] "Geisenheim"
## [24] "Rahden-Kleinendorf"
## [25] "Heinsberg-Schleiden"
## [26] "Eichstätt-Landershofen"
## [27] "Parsberg/Oberpfalz-Eglwang"
## [28] "Perl-Nennig"
## [29] "Warburg"
## [30] "Altheim, Kreis Biberach"
## [31] "Friedrichshafen-Unterraderach"
## [32] "Wangerland-Hooksiel"
## [33] "Lenzkirch-Ruhbühl"
## [34] "Neunkirchen-Wellesweiler"
## [35] "Boizenburg"
## [36] "Leuchtturm Kiel"
## [37] "Rosengarten-Klecken"
## [38] "Artern"
## [39] "Barth"
## [40] "Schlüchtern-Herolz"
## [41] "Neustadt am Kulm-Filchendorf"
## [42] "Düsseldorf"
## [43] "Freudenberg/Main-Boxtal"
## [44] "Weißenburg-Emetzheim"
## [45] "Querfurt-Mühle Lodersleben"
## [46] "Oberhaching-Laufzorn"
## [47] "Wusterwitz"
## [48] "Königshofen, Bad"
## [49] "Ostenfeld (Rendsburg)"
## [50] "Wuppertal-Buchenhofen"
## [51] "Karlshagen"
## [52] "Wolfach"
## [53] "Martinroda"
## [54] "Oberviechtach"
## [55] "Hasenkrug-Hardebek"
## [56] "Waldmünchen"
## [57] "Schorndorf-Knöbling"
## [58] "Blankenrath"
## [59] "Birx/Rhön"
## [60] "Aue"
## [61] "Kaufbeuren-Oberbeuren"
## [62] "Pirmasens"
## [63] "Stötten"
## [64] "Görlitz"
## [65] "Waldems-Reinborn"
## [66] "Pfullendorf"
## [67] "Neubulach-Oberhaugstett"
## [68] "Kleiner Feldberg/Taunus"
## [69] "Trollenhagen"
## [70] "Bernburg/Saale (Nord)"
## [71] "Lahr"
## [72] "Cölbe, Kr. Marburg-Biedenkopf"
## [73] "Steinau, Kr. Cuxhaven"
## [74] "Lobenstein, Bad"
## [75] "Oberstdorf"
## [76] "Göttingen"
## [77] "Mühldorf"
## [78] "Erfde"
## [79] "Königswinter-Heiderhof"
## [80] "Wasserkuppe"
## [81] "Borken in Westfalen"
## [82] "Müncheberg"
## [83] "Bremen"
## [84] "Kiefersfelden-Gach"
## [85] "Grambek"
## [86] "Lichtenhain-Mittelndorf"
## [87] "Erfurt-Weimar"
## [88] "Oberharz am Brocken-Stiege"
## [89] "Trier-Petrisberg"
## [90] "Kahler Asten"
## [91] "Schneifelforsthaus"
## [92] "Chieming"
## [93] "Moringen-Lutterbeck"
## [94] "Stechlin-Menz"
## [95] "Kempten"
## [96] "Wittstock-Rote Mühle"
## [97] "Großenkneten"
## [98] "Müllheim"
## [99] "Möhrendorf-Kleinseebach"
## [100] "Landshut-Reithof"
## [101] "Belm"
## [102] "Klipphausen-Garsebach"
## [103] "Grünow"
## [104] "Michelstadt-Vielbrunn"
## [105] "Potsdam"
## [106] "Weihenstephan-Dürnast"
## [107] "Doberlug-Kirchhain"
## [108] "Zwiesel"
## [109] "Wittingen-Vorhop"
## [110] "Deutschneudorf-Brüderwiese"
## [111] "Sankt Peter-Ording"
## [112] "Marnitz"
## [113] "Michelstadt"
## [114] "Kissingen, Bad"
## [115] "Ruppertsecken"
## [116] "Plauen"
## [117] "Elster, Bad-Sohl"
## [118] "Waghäusel-Kirrlach"
## [119] "Feuchtwangen-Heilbronn"
## [120] "Lennestadt-Theten"
## [121] "Berlin Brandenburg"
## [122] "Muskau, Bad"
## [123] "Waltershausen"
## [124] "Kahl/Main"
## [125] "Geldern-Walbeck"
## [126] "Berlin-Dahlem (FU)"
## [127] "Mannheim"
## [128] "Würzburg"
## [129] "Ueckermünde"
## [130] "Naumburg/Saale-Kreipitzsch"
## [131] "Hermaringen-Allewind"
## [132] "Aachen-Orsbach"
## [133] "Hohwacht"
## [134] "Baruth"
## [135] "Helmstedt-Emmerstedt"
## [136] "Ulm-Mähringen"
## [137] "Hannover"
## [138] "Altomünster-Maisbrunn"
## [139] "Eslohe"
## [140] "Fritzlar/Eder"
## [141] "Feldberg/Mecklenburg"
## [142] "Leuchtturm Alte Weser"
## [143] "Greifswald"
## [144] "Idar-Oberstein"
## [145] "Krölpa-Rockendorf"
## [146] "Schwäbisch Gmünd-Weiler"
## [147] "Lenzen/Elbe"
## [148] "Andernach"
## [149] "Tribsees"
## [150] "Schleiz"
## [151] "Mühlacker"
## [152] "Hümmerich"
## [153] "Dillingen/Donau-Fristingen"
## [154] "Dörnick"
## [155] "Pforzheim-Ispringen"
## [156] "Bochum"
## [157] "Braunlage"
## [158] "Dörpen"
## [159] "Amberg-Unterammersricht"
## [160] "Sachsenheim"
## [161] "Seehausen"
## [162] "Großer Arber"
## [163] "Lohr/Main-Halsbach"
## [164] "Eppingen-Elsenz"
## [165] "Oberzent-Beerfelden"
## [166] "Reichshof-Eckenhagen"
## [167] "Neuburg/Kammel-Langenhaslach"
## [168] "Schwerin"
## [169] "Weimar-Schöndorf"
## [170] "Wernigerode-Schierke"
## [171] "Geringswalde-Altgeringswalde"
## [172] "Schmieritz-Weltwitz"
## [173] "Helgoland"
## [174] "Gottfrieding"
## [175] "Kirchdorf/Poel"
## [176] "Berus"
## [177] "Trostberg"
## [178] "Dachwig"
## [179] "Metzingen"
## [180] "Marienberg"
## [181] "Duisburg-Baerl"
## [182] "Stuttgart (Schnarrenberg)"
## [183] "Hechingen"
## [184] "Hahn"
## [185] "Reimlingen"
## [186] "Mallersdorf-Pfaffenberg-Oberlindhart"
## [187] "Hersfeld, Bad"
## [188] "Itzehoe"
## [189] "Merklingen"
## [190] "Lübben-Blumenfelde"
## [191] "Prackenbach-Neuhäusl"
## [192] "Kirchberg/Jagst-Herboldshausen"
## [193] "Ebrach"
## [194] "München-Flughafen"
## [195] "Jeßnitz"
## [196] "Reit im Winkl"
## [197] "Wiesbaden-Auringen"
## [198] "Wendisch Evern"
## [199] "Wacken"
## [200] "Hamburg-Fuhlsbüttel"
## [201] "Donauwörth-Osterweiler"
## [202] "Bremervörde"
## [203] "Buchenbach"
## [204] "Feldberg/Schwarzwald"
## [205] "Köthen (Anhalt)"
## [206] "Heinersreuth-Vollhof"
## [207] "Zehdenick"
## [208] "Gräfenberg-Kasberg"
## [209] "Wunstorf"
## [210] "Berge"
## [211] "Kitzingen"
## [212] "Teterow"
## [213] "Manderscheid-Sonnenhof"
## [214] "Renningen-Ihinger Hof"
## [215] "Twistetal-Mühlhausen"
## [216] "Bevern, Kr. Holzminden"
## [217] "Meiningen"
## [218] "Kleve"
## [219] "Sohland/Spree"
## [220] "Langenwetzendorf-Göttendorf"
## [221] "Weilerswist-Lommersum"
## [222] "Osterfeld"
## [223] "Zugspitze"
## [224] "Friesoythe-Altenoythe"
## [225] "Leinefelde"
## [226] "Freiburg/Elbe"
## [227] "Waltrop-Abdinghof"
## [228] "Salzuflen, Bad"
## [229] "Berka, Bad (Flugplatz)"
## [230] "Tönisvorst"
## [231] "Chemnitz"
## [232] "Goldberg"
## [233] "Nürnberg"
## [234] "Barsinghausen-Hohenbostel"
## [235] "Berleburg, Bad-Stünzel"
## [236] "Norderney"
## [237] "Kall-Sistig"
## [238] "Lüdenscheid"
## [239] "Klippeneck"
## [240] "Braunschweig"
## [241] "Saarbrücken-Burbach"
## [242] "Alsfeld-Eifa"
## [243] "Schwarzburg"
## [244] "Wiesenburg"
## [245] "Leipzig/Halle"
## [246] "Nauheim, Bad"
## [247] "Harzburg, Bad"
## [248] "Grambow-Schwennenz"
## [249] "Uelzen"
## [250] "Amerang-Pfaffing"
## [251] "Holzkirchen"
## [252] "Villingen-Schwenningen"
## [253] "Bamberg"
## [254] "Wittenberg"
## [255] "Möckern-Drewitz"
## [256] "Fehmarn"
## [257] "Lippstadt-Bökenförde"
## [258] "Singen"
## [259] "Arkona"
## [260] "Meinerzhagen-Redlendorf"
## [261] "Freiburg"
## [262] "Gevelsberg-Oberbröking"
## [263] "Lichtentanne"
## [264] "Schauenburg-Elgershausen"
## [265] "Schonungen-Mainberg"
## [266] "Lautertal-Oberlauter"
## [267] "Pommelsbrunn-Mittelburg"
## [268] "Deuselbach"
## [269] "Herzberg"
## [270] "Aldersbach-Kramersepp"
## [271] "Lindenberg"
## [272] "Gelbelsee"
## [273] "Alfhausen"
## [274] "Augsburg"
## [275] "Metten"
## [276] "Kyritz"
## [277] "Gardelegen"
## [278] "Eschwege"
## [279] "Frankfurt/Main"
## [280] "Memmingen"
## [281] "Klitzschen bei Torgau"
## [282] "Straubing"
## [283] "Wolfsburg (Südwest)"
## [284] "Burgwald-Bottendorf"
## [285] "Kubschütz, Kr. Bautzen"
## [286] "Notzingen"
## [287] "Essen-Bredeney"
## [288] "Saldenburg-Entschenreuth"
## [289] "Worms"
## [290] "Bassum"
## [291] "Manschnow"
## [292] "Neukirchen-Hauptschwenda"
## [293] "Schleswig"
## [294] "Jena (Sternwarte)"
## [295] "Waibstadt"
## [296] "Oy-Mittelberg-Petersthal"
## [297] "Lübeck-Blankensee"
## [298] "Neunkirchen-Seelscheid-Krawinkel"
## [299] "Neuruppin-Alt Ruppin"
## [300] "Seesen"
## [301] "Zinnwald-Georgenfeld"
## [302] "Mittelnkirchen-Hohenfelde"
## [303] "Tirschenreuth-Lodermühl"
## [304] "Soltau"
## [305] "Piding"
## [306] "Emmendingen-Mundingen"
## [307] "Hattstedt"
## [308] "Berlin-Buch"
## [309] "Ellwangen-Rindelbach"
## [310] "Genthin"
## [311] "Putbus"
## [312] "München-Stadt"
## [313] "Salzungen, Bad-Gräfen-Nitzendorf"
## [314] "Langenlipsdorf"
## [315] "Aschersleben-Mehringen"
## [316] "Rothenburg ob der Tauber"
## [317] "Holzdorf-Bernsdorf"
## [318] "Schönhagen (Ostseebad)"
## [319] "Sandberg"
## [320] "Leutkirch-Herlazhofen"
## [321] "Grainet-Rehberg"
## [322] "Simbach/Inn"
## [323] "Ohlsbach"
## [324] "Bertsdorf-Hörnitz"
## [325] "Worpswede-Hüttenbusch"
## [326] "Schaafheim-Schlierbach"
## [327] "Gera-Leumnitz"
## [328] "Offenbach-Wetterpark"
## [329] "Günzburg"
## [330] "Dresden-Hosterwitz"
## [331] "Cuxhaven"
## [332] "Neuhaus am Rennweg"
## [333] "Hameln-Hastenbeck"
## [334] "Borkum-Flugplatz"
## [335] "Rostock-Warnemünde"
## [336] "Steinhagen-Negast"
## [337] "Weinbiet"
## [338] "Emden"
## [339] "Nideggen-Schmidt"
## [340] "Schönwald/Ofr.-Brunn"
## [341] "Runkel-Ennerich"
## [342] "Leipzig-Holzhausen"
## [343] "Fichtelberg/Oberfranken-Hüttstadl"
## [344] "Quedlinburg"
## [345] "Sontra"
## [346] "Fürstenzell"
## [347] "Münsingen-Apfelstetten"
## [348] "Treuen"
## [349] "Siegsdorf-Höll"
## [350] "Kaiserslautern"
## [351] "Kiel-Holtenau"
## [352] "Harzgerode"
## [353] "Elpersbüttel"
## [354] "Frankfurt/Main-Westend"
## [355] "Lügde-Paenbruch"
## [356] "Nossen"
## [357] "Carlsfeld"
## [358] "Anklam"
## [359] "Ebersberg-Halbing"
## [360] "Lüchow"
## [361] "Markt Erlbach-Hagenhofen"
## [362] "Hof"
## [363] "Olsdorf"
## [364] "Trier-Zewen"
## [365] "Tann/Rhön"
## [366] "Saarbrücken-Ensheim"
## [367] "Baden-Baden-Geroldsau"
## [368] "Weiden"
## [369] "Arnstein-Müdesheim"
## [370] "Bielefeld-Deppendorf"
## [371] "Lingen-Baccum"
## [372] "Dürkheim, Bad"
## [373] "Groß Lüsewitz"
## [374] "Neuenahr, Bad-Ahrweiler"
## [375] "Mühlhausen/Thüringen-Görmar"
## [376] "Sigmaringen-Laiz"
## [377] "Olbersleben"
## [378] "Hilgenroth"
## [379] "Angermünde"
## [380] "Brocken"
## [381] "Rottweil"
## [382] "Diepholz"
## [383] "Harburg"
## [384] "Elsendorf-Horneck"
## [385] "Hamburg-Neuwiedenthal"
## [386] "Ingolstadt-Manching"
## [387] "Lüdinghausen-Brochtrup"
## [388] "Magdeburg"
## [389] "Buchen, Kr. Neckar-Odenwald"
## [390] "Wittenborn"
## [391] "Maisach-Galgen"
## [392] "Konstanz"
## [393] "Dachsberg-Wolpadingen"
## [394] "Leck"
## [395] "Arnsberg-Neheim"
## [396] "Schleswig-Jagel"
## [397] "Attenkam"
## [398] "Hoyerswerda"
## [399] "Cottbus"
## [400] "Boltenhagen"
## [401] "Köln-Stammheim"
## [402] "Löhnberg-Obershausen"
## [403] "Dippoldiswalde-Reinberg"
## [404] "Bergzabern, Bad"
## [405] "Simmern-Wahlbach"
## [406] "Wutöschingen-Ofteringen"
## [407] "Öhringen"
## [408] "Fulda-Horas"
## [409] "Werl"
## [410] "Ennigerloh-Ostenfelde"
## [411] "Alfeld"
## [412] "Greifswalder Oie"
## [413] "Meßstetten-Appental"
## [414] "Roth"
## [415] "Hiddensee-Vitte"
## [416] "Neu-Ulrichstein"
## [417] "Weidenbach-Weiherschneidbach"
## [418] "Waren (Müritz)"
## [419] "Münster/Osnabrück"
## [420] "Nienburg"
## [421] "Falkenberg,Kr.Rottal-Inn"
## [422] "Groß Berßen"
## [423] "Rheinfelden"
## [424] "Fichtelberg"
## [425] "Lauchstädt, Bad"
## [426] "Köln/Bonn"
## [427] "Wielenbach (Demollstr.)"
## [428] "Neuburg an der Donau"
## [429] "Ahaus"
## [430] "Rotenburg (Wümme)"
## [431] "Rosenheim"
## [432] "Oschatz"
## [433] "Eisenach"
## [434] "Wunsiedel-Schönbrunn"
## [435] "Ingelfingen-Stachenhausen"
## [436] "Berlin-Tempelhof"
## [437] "Regensburg"
## [438] "Weiskirchen/Saar"
## [439] "Hohenpeißenberg"
## [440] "Laage-Kronskamp"
## [441] "Schipkau-Klettwitz"
## [442] "Freudenstadt"
## [443] "Teuschnitz"
## [444] "Demker"
## [445] "Nürburg-Barweiler"
## [446] "Coschen"
## [447] "Großerlach-Mannenweiler"
## [448] "Kösching"
## [449] "Rheinau-Memprechtshofen"
## [450] "Dresden-Klotzsche"
## [451] "Geisingen"
## [452] "Zeitz"
## [453] "Weingarten, Kr. Ravensburg"
## [454] "Rheinstetten"
Let’s try to calculate the mean.
mean(weather_data$city)
## Warning in mean.default(weather_data$city): argument is not numeric or logical:
## returning NA
## [1] NA
It does not work! Even by hand, we could not calculate the mean of a character vector.
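Before calling mean(), it can help to check what type of vector you are dealing with. The following is a minimal sketch with two made-up vectors (temps and cities are hypothetical stand-ins for mean_temp and city); class() and is.numeric() are base R functions.

```r
# Hypothetical stand-ins for weather_data$mean_temp and weather_data$city
temps <- c(11.14, 12.28, 11.37)
cities <- c("Tholey", "Kronach", "Potsdam")

class(temps)        # "numeric"
class(cities)       # "character"

is.numeric(temps)   # TRUE: mean() will work
is.numeric(cities)  # FALSE: mean() would return NA with a warning
```

If is.numeric() returns FALSE, a mean is not meaningful for that variable, and a frequency table is usually the better summary.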
Here is an overview of functions for measures of centrality and variability:
mean()
median()
var()
sd()
range()
IQR()
You can try them out here:
# Median
median(weather_data$mean_temp)
## [1] 10.64
# Variance
var(weather_data$mean_temp)
## [1] 1.61514
# Standard deviation
sd(weather_data$mean_temp)
## [1] 1.270882
# Range
range(weather_data$mean_temp)
## [1] -2.90 13.14
# Interquartile range (IQR)
IQR(weather_data$mean_temp)
## [1] 1.015
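Just as with the mean, we can reproduce the variance and the standard deviation "by hand" to see what var() and sd() do. This is a small sketch using the example_vec from above; var_by_hand and sd_by_hand are names chosen only for this illustration. Note that var() computes the sample variance, so we divide by n - 1.

```r
example_vec <- c(1, 2, 3, 4, 5)
n <- length(example_vec)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
var_by_hand <- sum((example_vec - mean(example_vec))^2) / (n - 1)
var_by_hand        # 2.5
var(example_vec)   # 2.5

# Standard deviation: square root of the variance
sd_by_hand <- sqrt(var_by_hand)
all.equal(sd_by_hand, sd(example_vec))   # TRUE
```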
Unfortunately, there is no direct function to get the mode. The solutions you will find online are all a bit advanced. So the easiest solution is to look for the mode using a frequency table.
table(weather_data$cold)
##
## 0 1
## 440 14
The table() function shows you how often each value occurs
in the vector. You can now identify the most frequent value.
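If you do not want to read the mode off the table yourself, you can let R pick the most frequent value. The following is a minimal sketch on a made-up vector (cold_example, freq and mode_value are hypothetical names). It only returns the first value if several values are tied for the highest count.

```r
# Made-up vector of 0/1 values, similar in spirit to the cold variable
cold_example <- c(0, 0, 1, 0, 1, 0)

freq <- table(cold_example)
freq

# which.max() gives the position of the (first) largest count;
# names(freq) holds the values that were counted, as character strings
mode_value <- names(freq)[which.max(freq)]
mode_value   # "0"
```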
Now we will work with the weather_data data set. It is
already loaded for you and you can use it right away.
Show the values of the variable mean_temp that are over
10.
Generate a new variable called hot that is 0
for mean temperatures of 10 degrees Celsius or below and 1 for mean
temperatures above 10 degrees Celsius.
Have a look at your data set.
Please solve all three steps in the next code chunk.
This is a little trickier: Can you find the hottest and coldest cities in Germany in 2023?
Hint: The functions min() and max() help
you to find the minimum and maximum values of a vector or variable.
Combine that with your newly learned subsetting skills and you’ll find
the answer.
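To illustrate the idea behind the hint without giving away the answer, here is a sketch on a tiny made-up data set. The names toy_data, place and temp are hypothetical and have nothing to do with weather_data.

```r
# Made-up toy data; toy_data, place and temp are hypothetical names
toy_data <- data.frame(
  place = c("A", "B", "C", "D"),
  temp = c(9.5, 12.1, 7.3, 10.8),
  stringsAsFactors = FALSE
)

# Largest and smallest value of a variable
max(toy_data$temp)   # 12.1
min(toy_data$temp)   # 7.3

# Logical subsetting: keep the entries of place where temp equals the extreme value
warmest <- toy_data$place[toy_data$temp == max(toy_data$temp)]
coldest <- toy_data$place[toy_data$temp == min(toy_data$temp)]
warmest   # "B"
coldest   # "C"
```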
We will continue working with the weather data set.
Calculate the mean value of latitude and save the result as
mean_latitude.
Calculate the variance of latitude and save the result as
var_latitude.
Calculate the standard deviation of latitude and save the result
as sd_latitude.
Let’s have a short look at our data again. Remember:
head() shows you the first six entries of your data. It is
very useful to get a look at the data structure when you have a lot of
rows in your dataset.
head(weather_data)
## city longitude latitude mean_temp cold
## 1 Sigmarszell-Zeisertsweiler 9.740446 47.57760 11.14 0
## 2 Obersulm-Willsbach 9.352493 49.12801 12.28 0
## 3 Röllbach 9.253038 49.76440 11.37 0
## 4 Padenstedt (Pony-Park) 9.925507 54.01884 10.22 0
## 5 Elzach-Fisnacht 8.108840 48.20121 11.32 0
## 6 Lippspringe, Bad 8.838795 51.78542 11.12 0
Now we can create a simple scatterplot:
plot(
x = weather_data$longitude,
y = weather_data$mean_temp
)
To get a nicer plot, we can adjust many things. We suggest always making those adjustments explicitly and in the same order.
plot(
x = weather_data$longitude,
y = weather_data$mean_temp,
type = "p", # This explicitly says that we want points. You could also try "l".
main = "Mean temperatures of German cities", # This adds a title to the plot
xlab = "Longitude (West - East)", # This labels the x-axis.
ylab = "Mean Temperature in 2023", # What does this do then?
las = 1, # This affects the tick labels of the y-axis.
pch = 19, # Here we choose what symbols we want to plot.
col = "black", # What color should the symbols have?
frame = FALSE # No box around the plot.
)
We can also adjust the colors. Let’s highlight Mannheim!
Pro Tip: To add color to your data visualizations, use the viridis package.
Viridis color scales are easier to read for people with color blindness and print well in greyscale. You probably don’t want to have plots like this:
We first need a vector that gives us the right colors with respect to the city variable.
library(viridis)
## Loading required package: viridisLite
# we need two colors, this is how we get them:
two_colors <- viridis(2)
two_colors # these are so-called HEX color codes
## [1] "#440154FF" "#FDE725FF"
# we use the first color for Mannheim and the second for all other cities
mannheim_color <- ifelse(weather_data$city == "Mannheim", two_colors[1], two_colors[2])
# let's have a look:
table(mannheim_color)
## mannheim_color
## #440154FF #FDE725FF
## 1 453
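The ifelse() function used above works element by element: it checks a condition for every entry of a vector and returns one value where the condition is TRUE and another where it is FALSE. Here is a minimal sketch with made-up values (places and point_color are hypothetical names, and the color names are chosen only for illustration):

```r
# Made-up vector of place names
places <- c("A", "B", "A", "C")

# ifelse(test, yes, no): where test is TRUE take yes, otherwise take no
point_color <- ifelse(places == "A", "purple", "yellow")
point_color
## [1] "purple" "yellow" "purple" "yellow"
```

The result has the same length as the input, which is why it can be passed directly to the col argument of plot().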
Now we can use this vector to color Mannheim differently from all other cities:
plot(
x = weather_data$longitude,
y = weather_data$mean_temp,
type = "p", # This explicitly says that we want points. You could also try "l".
main = "Mean temperatures of German cities", # This adds a title to the plot
xlab = "Longitude (West - East)", # This labels the x-axis.
ylab = "Mean Temperature in 2023", # What does this do then?
las = 1, # This affects the tick labels of the y-axis.
pch = 19, # Here we choose what symbols we want to plot.
col = mannheim_color, # Instead of just black we now use the color vector.
frame = FALSE # No frame around the plot.
)
Now that we use different colors, we also need a legend to know which color is which.
plot(
x = weather_data$longitude,
y = weather_data$mean_temp,
type = "p", # This explicitly says that we want points. You could also try "l".
main = "Mean temperatures of German cities", # This adds a title to the plot
xlab = "Longitude (West - East)", # This labels the x-axis.
ylab = "Mean Temperature in 2023", # What does this do then?
las = 1, # This affects the tick labels of the y-axis.
pch = 19, # Here we choose what symbols we want to plot.
col = mannheim_color, # Instead of just black we now use the color vector.
frame = FALSE # No frame around the plot.
)
legend(
"bottomleft", # Place the legend in the bottom-left corner.
legend = c("Mannheim", "other"), # Give it labels.
pch = 19, # Specify symbols as in the scatterplot.
col = two_colors, # Specify colors.
bty = "n" # No box around the legend.
)
We can also label a point directly in the plot:
plot(
x = weather_data$longitude,
y = weather_data$mean_temp,
type = "p", # This explicitly says that we want points. You could also try "l".
main = "Mean temperatures of German cities", # This adds a title to the plot
xlab = "Longitude (West - East)", # This labels the x-axis.
ylab = "Mean Temperature in 2023", # What does this do then?
las = 1, # This affects the tick labels of the y-axis.
pch = 19, # Here we choose what symbols we want to plot.
col = mannheim_color, # Instead of just black we now use the color vector.
frame = FALSE # No frame around the plot.
)
# we want to label the point that refers to Mannheim
# We can do that with the text() function,
# But we need to subset the data, so that only Mannheim gets labelled,
# and no other city
text(
x = weather_data$longitude[weather_data$city == "Mannheim"], # subset Mannheim
y = weather_data$mean_temp[weather_data$city == "Mannheim"], # subset Mannheim
labels = "Mannheim", # label Mannheim as "Mannheim"
pos = 4 # position the label right to the point
)
Now we want to visualize mean temperature with a histogram. This is how you get a standard histogram:
hist(x = weather_data$mean_temp) # That's intuitive, but does not look too great
Again, we can adjust many things to make it nicer.
hist(
x = weather_data$mean_temp, # For a histogram we only specify x.
breaks = 50, # suggested number of bins (R may adjust it to get "pretty" break points)
main = "A Histogram",
xlab = "Mean temperature in degree Celsius",
ylab = "Number of observations",
las = 1, # print the axis tick labels horizontally
col = viridis(1), # One color only (first color from viridis)
border = "white" # That's the color of the bin borders.
)
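When breaks is a single number, hist() treats it as a suggestion and may choose a different number of bins. To see which bins were actually used, you can ask hist() to return its result without drawing anything. This is a small sketch on made-up values (x and h are hypothetical names):

```r
# Made-up values, just to illustrate the structure of the result
x <- c(1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 4.9, 5.6)

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, breaks = 3, plot = FALSE)

h$breaks   # the break points R actually used
h$counts   # number of observations in each bin

# Every observation ends up in exactly one bin
sum(h$counts) == length(x)   # TRUE
```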
We can also create density plots.
plot(
density(weather_data$mean_temp), # density() takes care of x, y and type.
main = "A Simple Density Plot",
xlab = "Mean temperature in degree Celsius",
ylab = "", # The y-axis is not really meaningful here.
col = viridis(1),
lwd = 2, # Control the width of the line
frame = F,
yaxt = "n" # Remove the y-axis.
)
And we can also fill the area underneath the curve:
plot(
density(weather_data$mean_temp), # density() takes care of x, y and type.
main = "A Simple Density Plot",
xlab = "Mean temperature in degree Celsius",
ylab = "", # The y-axis is not really meaningful here.
col = viridis(1),
lwd = 2, # Control the width of the line
frame = F,
yaxt = "n" # Remove the y-axis.
)
polygon(density(weather_data$mean_temp),
col = viridis(1, alpha = 0.5) # same color but 50% transparent
)
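Both plot() and polygon() work here because density() returns a set of x and y coordinates. A small sketch on simulated data (`toy_temp` is invented for illustration) shows what is inside that object and why the height of the curve is hard to interpret on its own: the area under the curve is approximately 1.

```r
# Simulated stand-in for the temperature variable (assumption, not real data).
set.seed(1)
toy_temp <- rnorm(200, mean = 10, sd = 2)

d <- density(toy_temp)
class(d)     # an object of class "density"
length(d$x)  # grid of x values at which the curve is evaluated
length(d$y)  # estimated density at each of these x values

# Approximate the area under the curve with the trapezoid rule.
n <- length(d$x)
area <- sum(diff(d$x) * (d$y[-1] + d$y[-n]) / 2)
area # close to 1
```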
Boxplots are another way to summarize a distribution:
boxplot(
x = weather_data$mean_temp, # As for histograms we only specify x.
main = "Boxplot of Mean temperature in degree Celsius",
ylab = "Mean temperature in degree Celsius",
las = 1,
col = plasma(1),
frame = F
)
Or a horizontal boxplot.
boxplot(
x = weather_data$mean_temp,
horizontal = T, # With horizontal = T we rotate the boxplot.
main = "Horizontal Boxplot of Mean temperature in degree Celsius",
xlab = "Mean temperature in degree Celsius",
las = 1,
frame = F
)
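To see the numbers a boxplot is drawn from, you can call boxplot.stats() directly. This is a minimal sketch on a short made-up vector (`toy_temp` is invented for illustration), using the default whisker rule of 1.5 times the box length:

```r
# Hypothetical values, chosen so that one observation lies far from the rest.
toy_temp <- c(7.5, 8.9, 9.4, 9.8, 10.1, 10.3, 10.9, 11.2, 11.8, 16.0)

box_numbers <- boxplot.stats(toy_temp)

# Lower whisker, lower hinge, median, upper hinge, upper whisker.
box_numbers$stats

# Observations beyond the whiskers, drawn as individual points.
box_numbers$out
```

Here the value 16.0 lies more than 1.5 box lengths above the upper hinge, so it is reported as an outlier and the upper whisker ends at 11.8.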
You learned in the lecture that boxplots have some disadvantages.
Violin plots are a very nice alternative!
This is how you get them:
library(vioplot)
vioplot(
x = weather_data$mean_temp,
horizontal = T, # With horizontal = T we rotate the violin plot.
main = "Horizontal Violin Plot of Mean temperature in degree Celsius",
xaxt = "n",
xlab = "Mean temperature in degree Celsius",
bty = "n",
axes = FALSE,
names = "",
border = NA
)
Okay, last round of exercises for today:
Make a histogram of the latitude variable.
Make the plot look nice (name the axes, add a main title, choose colors…)
What we learned in this session:
The first lab session and this script should equip you with all the tools (and lines of code) to tackle the first homework assignment.
Copy the lines of code that worked for something similar. Then, adjust the code according to your problem.
In terms of content, your homework asks you to inspect a data set on US presidential elections. You will calculate some measures of central tendency and variability. Finally, you will produce some nice plots.
It is best to get started with your homework as soon as possible after it is handed out on Friday.
Try to write the R code first. We will provide you with an .Rmd template to hand in your results.
To pass the homework assignment, you need to tackle ALL problems of a problem set. You also need to get most of the problems right (or at least show us that you tried everything to get them right).
If you have any questions concerning the lecture or the tutorial, please post them on Slack. We will answer them on a regular basis.
Do not hesitate to come to the office hours!
And always remember: if you have a question, it is never a stupid question. In fact, most of your fellow students probably have the same or a similar question. By asking it, you help everyone in this class.